🐛 Workload should still be resilient when catalog is deleted #2439
Pull request overview
This PR adds comprehensive end-to-end tests to verify that installed OLM extensions continue functioning correctly when their source catalog is deleted. The tests cover both standard runtime and experimental Boxcutter runtime scenarios.
Changes:
- Added new feature file with 8 scenarios testing catalog deletion resilience
- Implemented CatalogIsDeleted function to support catalog deletion in tests
- Added step registrations for ClusterExtension update operations
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| test/e2e/steps/steps.go | Adds CatalogIsDeleted function and step registrations for testing catalog deletion and ClusterExtension updates |
| test/e2e/features/catalog-deletion-resilience.feature | Defines 8 test scenarios covering extension resilience, resource restoration, config changes, version upgrades, and revision behavior when catalog is deleted |
All feedback has been addressed. Please feel free to check it out.
I wouldn't have expected any code changes to the Appliers, only to the resolution step since we already have the |
Hi @perdasilva, thx for looking at that.
Resolution can decide "keep what’s installed", but if the catalog/registry is gone we can’t unpack/render the bundle content anymore. So, the changes in the Applier are required because |
/approve Approved to go in, but there are still updates that need to happen |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: tmshort The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Enables installed extensions to continue working when their source catalog becomes unavailable or is deleted. When resolution fails due to catalog unavailability, the operator now continues reconciling with the currently installed bundle instead of failing.

Changes:
- Resolution falls back to installed bundle when catalog unavailable
- Unpacking skipped when maintaining current installed state
- Helm and Boxcutter appliers handle nil contentFS gracefully
- Version upgrades properly blocked without catalog access

This ensures workloads remain stable and operational even when the catalog they were installed from is temporarily unavailable or deleted, while appropriately preventing version changes that require catalog access.

Assisted-by: Cursor
Problem
When a catalog becomes unavailable (deleted, registry offline, network issues), installed extensions break or stop being maintained. This PR ensures extensions continue working with their installed version until the catalog becomes available again.
What This Fixes
Issues on main when catalog is unavailable/deleted:
Note: Boxcutter already maintains resources via CER controller; Helm did not.
Solution
Added smart fallback logic:
Key Changes
reconcileExistingRelease() to maintain resources when contentFS == nil
contentFS == nil (CER controller maintains)
What "Extension Continue Working" Means
An extension continues working when:
Installed=True
Testing
Added comprehensive e2e test suite in test/e2e/features/catalog-deletion-resilience.feature:
All scenarios tested for both Helm and Boxcutter runtimes where applicable.
What Still Requires Catalog (Correct Behavior)
Resolution Fails?
TL;DR Reconcile Workflow and Scenarios
Step 1: Resolution + rollout succeed (healthy)
Step 2: Resolution succeeds, rollout starts, rollout fails partway
Step 3: Catalog missing; resolution would fail, but we skip it
What happens:
Result:
Why this makes sense:
When does fallback happen?
Fallback to Installed only happens when:
In this scenario RollingOut is populated, so fallback never triggers.
/hold until we have an RFC approved
Closes: #209